Dev#7

Merged
haizhongzheng merged 5 commits into main from dev
May 29, 2026
Conversation

@haizhongzheng
Member

Merge dev into main.

WWWjiahui and others added 5 commits May 28, 2026 22:31
Upgrade the inference/runtime stack to the latest sglang and the
dependency versions it requires, validated end-to-end on the FSDP
backend (qwen3-1.7b math example, 2x L40).

Version pins (pyproject.toml, docs, Docker):
- sglang 0.5.5.post1 -> 0.5.12.post1
- torch 2.8.0 -> 2.11.0; torch_memory_saver 0.0.9 -> 0.0.9.post1
- transformers 4.57.1 -> 5.6.1 (sglang pins ==5.6.0, which has a
  flash-attention s_aux=None crash for non-sink models; 5.6.1 is the
  upstream patch release. Forced via [tool.uv] override-dependencies,
  which requires uv >= 0.10 -- documented in installation.md)
- peft -> >=0.18.0 (required by transformers 5.x)
- CUDA base image 12.9.1 -> 13.0.0

sglang 0.5.12 API compatibility:
- remove LoRAAbortReleasePatch (the abort-path lora_registry.release()
  it added is now fixed upstream; keeping it would double-release)
- remove enable_ep_moe from SGLangConfig (field dropped from ServerArgs)
- kernel package rename sgl_kernel -> sglang_kernel in the installation
  validator

transformers 5.x / sglang 0.5.12 runtime fixes (surfaced by the run):
- rlvr workflow: apply_chat_template now returns a BatchEncoding; pass
  return_dict=False to get the flat list[int] the rollout path expects
- fsdp apply_fsdp2: model._no_split_modules is a set in transformers 5.x;
  coerce to list before indexing
- raas free-port range capped at 55535 so sglang's derived gRPC port
  (port + 10000) stays <= 65535
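The free-port cap and the `_no_split_modules` coercion above can be sketched as follows. This is a minimal illustration, not the repository's code: `pick_base_port`, `no_split_modules_as_list`, and the lower bound of 30000 are hypothetical names and values; only the +10000 gRPC offset and the 55535 cap come from the commit message. The sketch does not check that the chosen port is actually free.

```python
import random
from typing import Iterable, List, Optional

MAX_PORT = 65535
GRPC_PORT_OFFSET = 10000  # from the commit message: gRPC port = port + 10000
MAX_BASE_PORT = MAX_PORT - GRPC_PORT_OFFSET  # 55535, the cap named above


def pick_base_port(low: int = 30000, rng: Optional[random.Random] = None) -> int:
    """Pick a base port whose derived gRPC port is still a valid port.

    `low` is a hypothetical lower bound chosen for illustration.
    """
    if not 0 < low <= MAX_BASE_PORT:
        raise ValueError(f"low must be in 1..{MAX_BASE_PORT}, got {low}")
    rng = rng or random.Random()
    return rng.randint(low, MAX_BASE_PORT)  # randint is inclusive on both ends


def no_split_modules_as_list(value: Optional[Iterable[str]]) -> List[str]:
    """Return an indexable list whether `value` is a set, a list, or None.

    Sets are sorted so the resulting order is deterministic; lists and
    tuples keep their original order.
    """
    if not value:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return sorted(value)
```

Sorting the set is a design choice made here for reproducibility; the commit only states that the value is coerced to a list before indexing.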

Scope: FSDP backend only. Megatron / VL paths are intentionally not
covered here.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
chore: bump sglang 0.5.5.post1 -> 0.5.12.post1 (FSDP path)
sglang 0.5.12's /health round-trips through the scheduler, which stays
saturated for ~30-40s during the initial unchunked prefill of ~2048
requests/engine. The old 3-strike / 30s watchdog (5s probe timeout)
hard-exited a busy-but-alive engine before the first rollout batch
completed, hanging the rollout pipeline at step 0.

Raise the /health probe timeout 5s -> 20s so a slow-but-alive endpoint
isn't marked failed, and the failure budget 3 -> 5 strikes. A crashed
engine refuses connections instantly, so real-death detection stays
~50s (worst case ~100s) while the prefill ramp is tolerated. Verified:
math and code qwen3-8b-m2po-delta recipes train through the ramp with
zero watchdog strikes.
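A strike-counting watchdog with the new budget might look roughly like the sketch below. It is an illustration under assumptions, not the project's implementation: `HealthWatchdog`, `probe`, and `tick` are hypothetical names, and resetting the counter on a healthy probe (consecutive strikes) is assumed, since the commit message does not say how strikes are counted. Only the 20s timeout and the 5-strike budget come from the text above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HealthWatchdog:
    # probe(timeout_s) should return True if /health answered within timeout_s.
    # It is injected so the sketch stays independent of any HTTP client.
    probe: Callable[[float], bool]
    probe_timeout_s: float = 20.0  # raised from 5s per the commit message
    max_strikes: int = 5           # raised from 3 per the commit message
    strikes: int = 0

    def tick(self) -> bool:
        """Run one probe; return True once the engine should be declared dead."""
        try:
            healthy = self.probe(self.probe_timeout_s)
        except Exception:
            # A refused connection or a timeout counts as a failed probe.
            healthy = False
        # Assumption: strikes are consecutive, so one healthy probe resets them.
        self.strikes = 0 if healthy else self.strikes + 1
        return self.strikes >= self.max_strikes
```

A caller would invoke `tick()` on a fixed interval and hard-exit the engine only when it returns True. A crashed engine fails each probe immediately, while a saturated one has up to 20s per probe to answer.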
…ution

Two from-scratch install blockers with the sglang 0.5.12 / torch 2.11
stack:

- sglang 0.5.12 depends on flash-attn-4>=4.0.0b9 (a pre-release pulled in
  as a dependency), so resolution fails unless pre-releases are allowed.
  Add prerelease = "allow" to [tool.uv] so `uv pip install -e ".[sglang]"`
  resolves on both the conda and Docker paths.

- flash-attn 2.8.3 builds from source; nvcc writes GBs of intermediates to
  $TMPDIR. When $TMPDIR is a small/NFS-quota'd home the build fails with
  "nvFatbin error: empty input" / "Disk quota exceeded" from truncated
  temps. Document setting CUDA_HOME and a roomy TMPDIR, switch the sglang
  step to the project-extra form, and clarify flash-attn (FA2, trainer) vs
  flash-attn-4 (pulled in by sglang).
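Before starting the flash-attn source build, a pre-flight check on the temp directory can catch the small-TMPDIR case early. The sketch below is a hypothetical helper, not part of the repository: `tmpdir_has_room` and the 20 GiB default are assumptions (the commit only says nvcc writes "GBs" of intermediates). Note that `shutil.disk_usage` reports filesystem-level free space, so it may not reflect a per-user NFS quota.

```python
import os
import shutil
import tempfile
from typing import Optional


def tmpdir_has_room(min_free_gib: float = 20.0, path: Optional[str] = None) -> bool:
    """Return True if the build temp directory has at least `min_free_gib` free.

    The directory is `path` if given, else $TMPDIR, else the platform default.
    The 20 GiB default is a placeholder threshold, not a measured requirement.
    """
    target = path or os.environ.get("TMPDIR") or tempfile.gettempdir()
    free_bytes = shutil.disk_usage(target).free
    return free_bytes >= min_free_gib * 1024**3


if __name__ == "__main__":
    if not tmpdir_has_room():
        print("TMPDIR looks small; point it at a roomier disk before building.")
```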
sglang declares an unbounded dependency on "kernels", so uv resolved the latest
release (0.15), but transformers 5.6.1 only supports kernels<0.13: its hub_kernels
module constructs LayerRepository() without a revision/version, which kernels 0.15
rejects, so `import sglang` crashes with "Either a revision or a version
must be specified." Pin to the range transformers 5.6.1 expects (0.12.x).
Verified on a from-scratch env: kernels resolves to 0.12.3 and the math
recipe trains.
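A quick way to confirm that an environment picked up the pinned range is to read the installed version of the `kernels` distribution. The sketch below is illustrative only: the helper names are hypothetical, and it compares just the major and minor components rather than doing full version-specifier matching.

```python
import re
from importlib import metadata
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract the leading 'X.Y' of a version string as integers."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def kernels_version_ok(version: str) -> bool:
    """True if `version` falls in the 0.12.x range the commit pins to."""
    return parse_major_minor(version) == (0, 12)


def installed_kernels_ok() -> bool:
    """Check the installed 'kernels' distribution; False if it is missing."""
    try:
        return kernels_version_ok(metadata.version("kernels"))
    except metadata.PackageNotFoundError:
        return False
```

For example, `kernels_version_ok("0.12.3")` is true for the version the from-scratch environment resolved to, while `kernels_version_ok("0.15.0")` is false.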
@haizhongzheng haizhongzheng merged commit 6145e22 into main May 29, 2026
1 check failed
2 participants